Exploring Grapheme-to-Phoneme Induction with Machine Learning
Abstract
Text-to-speech (TTS) systems are increasingly used in the modern world. One subproblem of TTS is determining the phonetic structure of words, i.e., their pronunciation, from their orthography, i.e., their spelling. This is known as the grapheme-to-phoneme (G2P) problem. It is a nontrivial task in any language, but particularly in English, whose rich linguistic history has led to an irregular and inconsistent spelling system. A single letter, even in similar contexts, can be pronounced in several different ways; see Table 1 for examples. The most common way around unpredictable pronunciations is a phonetic dictionary with entries that look like the rows in Table 1. However, neologisms like Google in English, Wikcionario in Spanish, and Klimakatastrophe in German constantly creep into the lexicon. Keeping track of all these new words is impossible, yet native speakers can easily pronounce most neologisms at first sight. This suggests that their orthography carries enough information to determine their pronunciation. The first solutions to the G2P problem involved hand-writing sets of pronunciation rules for various languages. With the onset of the machine learning (ML) paradigm, this method became outmoded.[4] In the ML version, the basic idea is to learn a set of rules that map strings of graphemes to strings of phonemes based on their orthographic context, i.e., the letters surrounding the grapheme. These rules are learned from a phonetic dictionary like [2] and can then be applied to new words outside the training set.
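The idea of learning context-dependent rules from a dictionary can be illustrated with a minimal sketch. It is not the method of the paper: it assumes a hypothetical toy dictionary (`ALIGNED`) in which every letter is already aligned to exactly one ARPAbet-style phoneme symbol, with `_` marking a silent letter, and it uses a simple back-off from the full context (left letter, letter, right letter) to (letter, right letter) to the letter alone. The function names `learn_rules` and `pronounce` are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-aligned toy dictionary: one phoneme symbol per letter,
# "_" for a silent letter. Real dictionaries need a separate alignment step.
ALIGNED = [
    ("cat",  ["K", "AE", "T"]),
    ("cot",  ["K", "AA", "T"]),
    ("city", ["S", "IH", "T", "IY"]),
    ("cent", ["S", "EH", "N", "T"]),
    ("tin",  ["T", "IH", "N"]),
    ("nap",  ["N", "AE", "P"]),
    ("knit", ["_", "N", "IH", "T"]),
]


def contexts(word):
    """Yield (left, letter, right) for each letter; '#' marks a word boundary."""
    padded = "#" + word + "#"
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i], padded[i + 1]


def learn_rules(aligned):
    """Map each observed context to its most frequent phoneme."""
    counts = defaultdict(Counter)
    for word, phonemes in aligned:
        if len(word) != len(phonemes):
            raise ValueError(f"{word!r} is not aligned one phoneme per letter")
        for (left, letter, right), ph in zip(contexts(word), phonemes):
            counts[(left, letter, right)][ph] += 1  # most specific context
            counts[(letter, right)][ph] += 1        # back-off: right neighbour
            counts[(letter,)][ph] += 1              # back-off: letter alone
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def pronounce(word, rules):
    """Apply the most specific matching rule to each letter of the word."""
    phonemes = []
    for left, letter, right in contexts(word):
        for key in ((left, letter, right), (letter, right), (letter,)):
            if key in rules:
                ph = rules[key]
                break
        else:
            ph = "?"  # letter never seen in training
        if ph != "_":  # silent letters produce no phoneme
            phonemes.append(ph)
    return phonemes


rules = learn_rules(ALIGNED)
print(pronounce("cit", rules))  # unseen word; "c" before "i" follows "city"
print(pronounce("can", rules))  # unseen word; "c" before "a" follows "cat"
```

In this sketch the letter "c" receives a different phoneme depending on the letter that follows it, which is the kind of orthographic context the learned rules are meant to capture. A realistic system would also need to align letters to phonemes automatically and handle letters that map to more than one phoneme.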
Publication date: 2007